Proteins: Structure, Function, and Bioinformatics — Latest Matching Preprints

1

A Comprehensive Evaluation of Protein Structure Prediction Models for Short Peptides

Ghosh, B.; Mukherjee, A.

2026-07-03 biophysics 10.64898/2026.07.02.736085 medRxiv

Top 0.1%

9.8%

Short peptides pose distinct challenges for computational structural biology due to their lack of stable tertiary structures, high conformational flexibility, and limited evolutionary signals. To address how modern deep-learning architectures navigate these challenges, we conducted a comprehensive benchmarking of five state-of-the-art protein structure prediction models: AlphaFold2, RoseTTAFold2, ESMFold, OmegaFold, and DMPfold2. Using a curated dataset of experimentally determined short peptide structures (10-49 amino acids) from the Protein Data Bank, we systematically evaluated predictive performance across varying sequence lengths and secondary structure classes. Our results demonstrate that prediction accuracy systematically improves with peptide length. Furthermore, all models perform significantly better on α-helical and mixed-structure peptides compared to β-sheet-rich and intrinsically disordered sequences. Among the evaluated methods, AlphaFold2 and the single-sequence language models, ESMFold and OmegaFold, proved to be the most consistent and accurate overall. We also observed that internal model confidence scores are imperfectly calibrated for short peptides, necessitating cautious interpretation. Finally, by extending our analysis to the dbAMP3 dataset of uncharacterized antimicrobial peptides, we demonstrate that a multi-model consensus approach provides a rational framework for identifying robust structural hypotheses in the absence of experimental reference structures.

2

BioMetAll v2.0: Introducing Scores, Metal Discrimination, and Side-Chain Descriptors for Predicting Metal-Binding Sites in Proteins.

Marechal, J. D.; Fernandez Diaz, R.; Pena Losada, R.; Sanchez Aparicio, J. E.; Gao, W.; Alemany, M.

2026-07-12 bioinformatics 10.64898/2026.07.09.737562 medRxiv

Top 0.1%

7.7%

Predicting the location of metal-binding sites in proteins is crucial for fundamental biological questions and biotechnological applications. Over the past decade, the rise in metal-bound protein structures in the Protein Data Bank, combined with advanced statistical models such as deep learning, has accelerated the development of metal-binding site prediction tools. Several approaches are now available, offering high-quality benchmarks and predictive performance. Our initial development in this area is BioMetAll, whose first version was based on backbone pre-organization. Here, we introduce its second version, featuring two major updates: 1) metal-specific scoring functions and 2) prediction using backbone geometry alone or in combination with first coordination sphere descriptors. Apart from demonstrating metal sensitivity and yielding better benchmarking results, this new version allows the assessment of the influence of considering the metal's first coordination sphere versus backbone pre-organization on how metallic species bind to proteins.

3

Location dependence of protein intrinsic disorder in Drosophila melanogaster

Abdulla Daanaa, H. S.; Kuraku, S.; Akashi, H.; Saito, K.

2026-07-03 bioinformatics 10.64898/2026.07.02.732782 medRxiv

Top 0.1%

7.2%

The relevance of protein structural flexibility in function remains contested, but experimental and computational evidence continues to accumulate. Many efforts to address this investigate intrinsic disorder, which commonly refers to peptide segments or entire protein sequences that presumably lack structure and exhibit high flexibility/conformational heterogeneity under physiological conditions. These efforts face challenges such as conflicting computational predictions and ambiguous relationships among intrinsic disorder locations and other protein properties. We address these challenges at a genome-wide scale in Drosophila melanogaster using residue-level predictions for various protein properties. We employ single and consensus approaches to quantify the prevalence of intrinsic disorder and attempt to infer function by testing for differences along protein sequences. Intrinsic disorder is likely more common at terminals than internal regions, and amino acid frequencies can vary substantially between regions in a manner that plausibly reflects functions of intrinsic disorder, rather than only proteome-wide effects. Tertiary structure potentially underlies the prevalence of intrinsic disorder along protein sequences; this prevalence varies more in a putatively solvent-exposed context than a solvent-buried one. Protein-binding appears to be a main function of intrinsic disorder, and we find support consistent with the notion that structural flexibility fosters binding plasticity, and show that location and protein length are factors in this relationship. Nucleic acid-binding and linker are ostensibly less common disorder functions than protein-binding, but nucleic acid-binding seems more localized at terminals. Residue-level estimates of selection pressure indicate that disordered regions generally evolve under weaker sequence constraints than structured regions, except at the N-terminal region. 
Biases in disorder prediction are a considerable factor in many of the observations, but unlikely a full explanation. The findings strengthen support for functional relevance of flexibility, offer insight into protein architecture and function, and lend impetus for experimental inquiry.

4

The Gompertz curve for estimating growth rates of Protein Data Bank and protein folds

Sato, K.; Tomii, K.

2026-06-26 bioinformatics 10.64898/2026.06.24.732253 medRxiv

Top 0.1%

6.8%

The Protein Data Bank (PDB) is an ever-growing, open-access repository of structural data of biological molecules. This international database has been instrumental in the development of artificial intelligence and deep learning models for protein structure prediction and design. The PDB growth is a crucially important factor influencing further development of these models. Therefore, after analyzing the growth trend in PDB depositions since the archive's launch, we found that it is well fitted by the Gompertz function, a growth curve used across various disciplines. Furthermore, we observed that the function captures the "discovery of novel folds", i.e., the cumulative number of distinct folds among protein domains that constitute most of the PDB. Consequently, based on the fitting results, we estimated the likely numbers of PDB entries and protein folds. These findings provide insights into deceleration of growth in recent years and enable us to assess anticipated trends.
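
The Gompertz function mentioned in this abstract is commonly written as N(t) = K * exp(-b * exp(-c * t)), where K is the asymptotic ceiling. The sketch below only illustrates the shape of the curve and its inflection point; the parameter values are hypothetical and are not fitted to PDB data or taken from the preprint.

```python
import math

def gompertz(t, K, b, c):
    """Gompertz curve K * exp(-b * exp(-c * t)); K is the upper asymptote."""
    return K * math.exp(-b * math.exp(-c * t))

def inflection_time(b, c):
    """Time of fastest growth, where the curve passes through K / e."""
    return math.log(b) / c

# Hypothetical parameters chosen for illustration only (not fitted values).
K, b, c = 400_000.0, 8.0, 0.05
t_star = inflection_time(b, c)
value_at_inflection = gompertz(t_star, K, b, c)  # equals K / e by construction
```

After the inflection time, yearly increments shrink, which is the kind of deceleration the abstract refers to. In practice the three parameters would be estimated by nonlinear least squares against cumulative counts.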

5

The Hidden Disorder Divide: Reconciling Benchmark Inconsistencies in Intrinsically Disordered Protein Binding Site Prediction

Malhis, N.; Mehdiabadi, M.; Erdos, G.; Gsponer, J.; Kurgan, L.; Tosatto, S. C. E.; Dosztanyi, Z.; Piovesan, D.

2026-06-27 bioinformatics 10.64898/2026.06.24.733783 medRxiv

Top 0.2%

5.6%

Computational predictors of protein-binding sites within intrinsically disordered regions (IDRs) show highly inconsistent performance across high-quality benchmark datasets. To understand the origins of these discrepancies, we systematically compared predictors across three independent test sets: two CAID datasets updated with the latest DisProt annotations and a composite dataset (DBs) assembled from DIBS, FuzDB, IDEAL, and MFIB. Predictors trained predominantly on DisProt data achieved substantially higher AUCs on the CAID sets but performed poorly on the DBs. In contrast, predictors trained on older, low-quality PDB-based datasets showed balanced performance across all sets, with a slight preference for DBs. Predictors with mixed training exposure displayed intermediate behavior. Through controlled experiments using identical CNN architectures and feature analysis, we demonstrate that the dominant factor driving these performance differences is the intrinsic disorder propensity of the binding sites themselves. Binding residues in DisProt-based datasets exhibit markedly higher average disorder propensity scores than those in PDB-derived datasets. This previously unrecognized selection bias -- literature studies preferentially characterizing more disordered binding sites, while PDB-derived annotations capture less disordered ones -- effectively splits IDR-protein binding sites into two distinct categories. Predictors optimized on one category therefore generalize poorly to the other. Binding-site length and sequence conservation play only minor or negligible roles in explaining the observed inconsistencies. These findings highlight a critical limitation in current benchmarking practices and training strategies for IDR-binding site prediction, underscoring the need for more balanced and disorder-aware reference datasets. Finally, the diagnostic techniques introduced here could prove valuable beyond the specific application examined in this study.
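
The AUC comparisons described in this abstract can be pictured with a minimal, dependency-free sketch. It computes the ROC AUC as the fraction of (binding, non-binding) residue pairs that a predictor ranks correctly. The labels and scores are toy values invented for illustration; they do not come from CAID, DisProt, or the DBs set.

```python
def roc_auc(labels, scores):
    """ROC AUC: probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative residue")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy per-residue annotations (1 = binding) and predictor scores.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)  # 3 of 4 pairs ranked correctly
```

Running the same function on each test set separately is what exposes a predictor that scores well on one category of binding site and poorly on the other.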

6

Direct Binding of Cysteine-367 Thiolate to the Active Site of the [FeFe]-Hydrogenase from Clostridium beijerinckii in the O2-stable State

Duan, J.; Arrigoni, F.; Rutz, A.; Hofmann, E.; Greco, C.; Happe, T.

2026-07-13 biochemistry 10.64898/2026.07.11.737921 medRxiv

Top 0.2%

5.4%

[FeFe]-hydrogenases are very active biocatalysts for H2 conversion. However, their active site is vulnerable to irreversible degradation initiated by O2 binding at the catalytic iron ion (Fed) of the active center. CbA5H, the [FeFe]-hydrogenase from Clostridium beijerinckii, exhibits stability towards oxygen (O2) due to its ability to reversibly enter an inactive state termed Hinact upon contact with O2. We previously proposed that the close distance of approximately 3.1 Å between the thiol of a nearby cysteine (C367) and the Fed, based on a 2.9 Å crystal structure of CbA5H in the Hinact state, enables their binding to each other. This binding was therefore suggested to shield the Fed from O2 damage. However, there is currently a lack of evidence to support this hypothesis. Furthermore, density functional theory (DFT) calculations based on a homologous model favored hydroxide as the binding ligand of the Fed over the thiol of C367. In this study, we present the crystal structure of CbA5H in the Hinact state at an improved resolution of 2.15 Å. The structure reveals a direct binding between the thiol of C367 and the Fed with a distance of approximately 2.77 Å, which is well supported by our DFT calculations based on the new crystallographic data. It is noteworthy that the 2.77 Å bond distance is strikingly long when compared with other iron-sulfur bonds. This finding may provide a crucial foundation for understanding the rapid reversibility of the Hinact state.

7

Benchmarking AI Protein Structure Predictors Reveals a Persistent Bias in Multi-State Proteins

Ye, M.; Wang, Y.-H.; Brogi, M.; Parks, J. M.; Kuo, K. M.; Gumbart, J. C.

2026-07-11 biophysics 10.64898/2026.07.10.737860 medRxiv

Top 0.2%

5.2%

Protein structure predictors achieve high single-state accuracy, but it remains unclear whether they can recover functionally relevant conformational ensembles or account for the presence of ligands and/or binding partners. Here, we benchmark AlphaFold3, Boltz-2, Chai-1, and BioEmu on four canonical multi-state proteins (Pf-MATE, LAO, SecA, and β2AR), quantifying state bias and sampling breadth against experimental reference structures. Models frequently default to a dominant state represented in the PDB; small-molecule ligands have weak or inconsistent effects, while large protein partners drive clear conformational switching between states. Multiple sequence alignment (MSA)-based approaches (AF-Cluster and random subsampling) recapitulate similar biases, indicating that this behavior is not unique to newer architectures. These results underscore current limitations for multi-state protein structure prediction and structure-guided ligand discovery.

8

Computationally engineered cyclic peptides reduce prion levels in vitro

Paspali, E.; Oueslati Morales, C. O.; de Raffele, D.; Aguzzi, A.; Caflisch, A.; Hornemann, S.; Ilie, I. M.

2026-07-07 biophysics 10.64898/2026.07.03.736251 medRxiv

Top 0.2%

4.8%

Prion diseases are neurodegenerative disorders associated with the structural conversion of the cellular prion protein (PrPc) into its misfolded infectious isoform (PrPSc). Despite substantial efforts, no disease-modifying therapy or cure is currently available. Here, we present an integrated computational-experimental pipeline for the rational design of cyclic peptides targeting PrPc to inhibit its pathogenic conversion. Starting from crystal structures of antibody-bound mouse PrPc, we develop a rational design strategy combined with iterative molecular dynamics simulations and sequence optimization to generate peptides with enhanced binding and structural impact. Three candidates were selected for experimental validation. Our results show that PH1 (49YGPDPSDSYT58, antibody numbering), which binds stably to the α2-α3 interface, most effectively reduced PrPSc levels in GT1-7 cells, essentially by inducing allosteric rearrangements that reinforce the intramolecular helical bundle. PL1 (89GQSNTKPYT97) and PL2 (89RQSNTWPYT97), which bind the β1-α1/α3 junction, exerted more modest effects due to the potential competition of the flexible tail to bind at this site. These results establish a mechanistic link between peptide-induced stabilization of PrPc and inhibition of prion propagation and provide a generalizable framework for designing conformational stabilizers of aggregation-prone proteins.

9

Structural Determinants of Catalytic Directionality in an AMP-Forming Acetyl-CoA Synthetase from Syntrophus aciditrophicus

Yaghoubi, S.; Dinh, D. M.; Thomas, L. M.; Wofford, N. Q.; McInerney, M. J.; Follmer, A. H.; Karr, E. A.

2026-07-07 biochemistry 10.64898/2026.07.06.736832 medRxiv

Top 0.3%

4.4%

Acetyl-coenzyme A (CoA) is a central metabolic intermediate that links carbon and energy metabolism across all domains of life. The conversion of acetate and acetyl-CoA is carried out by three enzyme pathways: acetate kinase/phosphotransacetylase, ADP-forming acetyl-CoA synthetase, and AMP-forming acetyl-CoA synthetase (Acs). Acs enzymes serve critical physiological roles across diverse organisms generally by catalyzing a reversible two-step reaction forming acetyl-CoA and AMP from acetate and ATP. Isolated from the wastewater reclamation facility in Norman, Oklahoma, Syntrophus aciditrophicus strain SB (Sa) relies on an AMP-forming acetyl-CoA synthetase (SaAcs1) that favors synthesizing acetate and ATP from acetyl-CoA and AMP, in contrast to all previously characterized Acs enzymes. The origin of this preference and the structural determinants of both the thioester-forming step and catalytic directionality remain poorly understood. Here, we report a 2.2 Å crystal structure of full-length SaAcs1 in the adenylation conformation with acetyl-AMP bound in the active site. Structural comparison to the extensively characterized Acs enzymes from Salmonella enterica (SeAcs) and Cryptococcus neoformans (CnAcs) revealed a displaced CoA-binding loop in SaAcs1. Enzymatic assays confirmed that SaAcs1 preferentially catalyzes the ATP-forming reaction. Site-directed mutagenesis demonstrated that reversion of two residues, G196 and T197, at the beginning of the CoA-binding loop to the consensus sequence repositions the loop and shifts catalytic preference toward the AMP-forming direction. Together, these results establish the CoA-binding loop and G196 and T197 as the primary structural determinants of directional preference in SaAcs1.

10

RNArefine: AI-guided Atomic-Level Refinement of RNA Structures

Tsukiyama, S.; Li, Y.; Sato, K.; Kurata, H.; Zhang, Y.

2026-06-29 bioengineering 10.64898/2026.06.26.734804 medRxiv

Top 0.3%

3.9%

Considerable progress has been made in AI-driven RNA structure prediction, but the resulting models often lack complete atomic details or suffer from severe stereochemical distortions and incorrect local interactions. We present RNArefine, an AI-guided hierarchical framework for atomic-level RNA structure refinement. RNArefine first predicts base-pairing and base-stacking interactions using geometric attention networks and then integrates the interactions with physics-based force fields to guide a two-step refinement strategy consisting of Monte Carlo conformational sampling followed by L-BFGS energy optimization. Large-scale benchmark experiments on both sequence-based prediction models and cryo-EM-derived structures demonstrated that RNArefine consistently improves stereochemical quality, interaction fidelity and physically penalized structural accuracy while preserving global topology. When applied to blind CASP16 RNA prediction models, RNArefine improved ranking scores for 28 of the top 30 groups. These results establish RNArefine as a robust open-source framework for transforming raw RNA folds into physically realistic atomic models for downstream structural and therapeutic applications.

11

Comparative Modelling of Actin-Tropomyosin Interfaces

Menon, R.; Balasubramanian, M.; Sowdhamini, R.

2026-07-10 bioinformatics 10.64898/2026.07.06.736648 medRxiv

Top 0.4%

3.2%

Tropomyosins are coiled-coil dimers that polymerize head-to-tail along actin filaments. They stabilize distinct filament populations and regulate the access of myosins and actin-binding proteins in both muscle and non-muscle contexts. Despite their central regulatory role, how filament length and isoform identity of different tropomyosin homologues might modulate actin affinity is not completely understood, especially across species. Here, we present a stepwise computational docking pipeline combining AlphaFold2-Multimer coiled-coil models, experimentally informed residue-level restraints, and pseudo-energy analysis via PPCheck to build and evaluate actin-tropomyosin co-polymer models for three isoforms: human TPM1 (hTPM1; 284 residues), human TPM4 (hTPM4; 248 residues), and Schizosaccharomyces pombe Cdc8 (SpCdc8; 161 residues). Interface energetics reveal a consistent hierarchy in which the shortest filament, SpCdc8, achieves the most stabilizing and residue-rich actin contacts, consistent with reduced cumulative geometric penalty along the actin helix. Among human isoforms, hTPM1 forms stronger interfaces with actin than hTPM4. The hTPM1-actin model also exhibits higher contact density and additional energetic hotspots, in agreement with the experimentally established slower exchange kinetics of TPM1 isoforms on actin filaments relative to TPM4. Hotspot mapping identifies conserved acidic residues at equivalent positions across all three isoforms, emphasizing the importance of electrostatic anchor points in maintaining interface integrity across diverse evolutionary contexts. Modeling of four temperature-sensitive SpCdc8 mutations (A18T, R21H, E31K and E129K) reveals that these substitutions substantially destabilize the coiled-coil dimer without significantly affecting actin interactions, suggesting that subtle regulatory failure arises from compromised longitudinal cable continuity rather than from direct loss of actin affinity. 
Taken together, our results support a hierarchical model of tropomyosin dimer stability and actin-tropomyosin recognition in which filament length imposes a geometric baseline on interface stability, onto which isoform-specific sequence evolution superimposes functional tuning. The tropomyosin homologues we studied appear to retain conserved electrostatic hotspots, thereby providing a common structural scaffold across tissues and organisms.

12

Capabilities, specificity gaps and training-data dependence of AlphaFold3 across diverse application areas

Follonier, O.; Liu, Y.; Campomanes, P.; Lafrenaye, L.; Racle, J.; Alvarez, D.; van Gerwen, J.; Heinzmann, R.; Jänes, J.; Kummelstedt, E.; Durairaj, J.; Gfeller, D.; Vanni, S.; Beltrao, P.

2026-07-13 bioinformatics 10.64898/2026.07.13.738147 medRxiv

Top 0.5%

2.7%

Structure prediction models have moved from single proteins to assemblies that include diverse biomolecules and their modifications. AlphaFold3 (AF3) and related models extended structural modelling via an all-atom framework, opening many new potential applications in structural biology. We evaluate how well the new capabilities of AF3 translate into application tasks in diverse areas: prediction of ubiquitinated protein structures, T-cell receptor (TCR)-epitope recognition, antibody-antigen complexes, protein-RNA and protein-lipid interactions. We find that, while AF3 can perform well in favourable settings, this performance is uneven across applications. In RNA-target predictions, the model confidence fails to separate genuine from decoy interaction partners and in several tasks accuracy depends on the presence of related complexes in the training set. Taken together, our assessment is more cautious than for AF2, whose gains in modelling monomers and complexes were clear and broadly generalisable. AF3's extension to new biomolecule types shows less consistent performance and generalisation. AF3 can be a powerful tool for hypothesis generation and prioritisation, but its predictions and use of confidence metrics will depend strongly on the specific application area and must be interpreted with respect to training-set overlap. We expect that the benchmarks provided here will serve for testing of future developments in the structure prediction field.

13

DnaK refolds denatured proteins by actively pulling out their misfolded structural elements

Marszałek, O. K.; Marszalek, P. E.

2026-06-23 biophysics 10.1101/2025.09.22.677870 medRxiv

Top 0.5%

2.4%

DnaK, a prokaryotic Hsp70 chaperone, plays a central role in proteostasis by restoring native structures to heat-denatured proteins in an ATP-hydrolysis-dependent manner. While structures of DnaK in complex with nucleotides, co-chaperones, and short peptides have been resolved, structures with larger, stably folded substrates--such as firefly luciferase (Fluc, 61 kDa)--are lacking, limiting mechanistic understanding of how DnaK refolds such proteins. Here, we generated models of the DnaK-Fluc complex using AlphaFold3 and evaluated their mechanistic relevance. In one of three major model clusters, Fluc is unexpectedly immobilized beneath the DnaK α-helical lid against the nucleotide-binding domain (NBD), rather than interacting primarily with the substrate-binding domain β (SBDβ), as commonly assumed. All-atom molecular dynamics simulations indicate that, in this configuration, the lid can engage a thermally destabilized Fluc helix (residues 405-411), which we recently identified as the first--and likely the only--helix to irreversibly melt at 42 °C. Upon binding, the lid forms extensive hydrogen-bonding interactions with the melted helix. These interactions persist during lid movement toward SBDβ (following ATP hydrolysis), enabling the lid to actively extract the helix from the Fluc surface. In contrast, simulations with the helix in its native folded state show that the lid cannot extract it, leaving the native structure unaffected. Equilibrium simulations further indicate that, once extracted and mechanically stretched, the melted helix can refold to its native conformation. Together, these findings suggest a revised mechanism for DnaK-mediated protein refolding, in which the α-helical lid selectively recognizes structurally compromised segments, forms stabilizing hydrogen bonds, and--powered by ATP hydrolysis--mechanically pulls them away from the protein surface to facilitate their refolding.
SIGNIFICANCE: DnaK is a model chaperone that can reactivate thermally denatured proteins. Although significant findings have been made over the span of 40 years about DnaK's structure, dynamics and interactions with its co-chaperones, the exact molecular mechanism by which DnaK refolds misfolded proteins remains a mystery. This work exploited AlphaFold3 to generate atomistic models of complexes between DnaK and firefly luciferase. Molecular dynamics simulations directly captured how DnaK may assist thermally denatured proteins by mechanically pulling out their misfolded helices. This study provides a new insight into the DnaK mechanism.

14

Exploring the large-scale properties of a protein secondary structure genotype-to-phenotype map

Novev, J. K.; Schornack, S.; Ahnert, S. E.

2026-06-26 biophysics 10.64898/2026.06.26.734756 medRxiv

Top 0.5%

2.4%

We perform a large-scale computational characterization of the map of protein primary to secondary structure using an AVR3a class protein effector domain from the plant pathogen P. palmivora as a case study. We formulate a modified site-scanning approach for exploring the neutral component of secondary structure phenotypes based on predictions from the machine-learning algorithm Porter 5 and apply it to the AVR3a phenotype. We predict a set of sensitive sites within the effector domain that are generally located at or near the boundaries of structured regions, with restrictions on the possible amino acid residues at these sites dictated by the secondary structure type that they participate in within the WT. We characterize a set of mutated phenotypes derived through the exploration of the neutral component of the WT effector domain, selecting them so that they span a range including both very rarely and very commonly seen secondary structures, and that they include both secondary structures nearly identical to the WT and ones far removed from it. We find that all these diverse phenotypes have an estimated robustness of the same order as that of the WT, and that the robustness scales logarithmically with phenotype frequency, as seen in other genotype-to-phenotype maps. Furthermore, we observe that the dependence of the estimated phenotype frequency on the Kolmogorov complexity indicates simplicity bias in the protein secondary structure map.

15

Computational Redesign of an Antifreeze Protein Using Deep Learning

Calia, C.; Altunc, A. J.; Eufemio, R. J.; Alvarado, B. O.; Huynh, J. D.; Oh, E.; Burkart, M.; Meister, K.; Paesani, F.

2026-06-24 biophysics 10.64898/2026.06.21.733612 medRxiv

Top 0.6%

2.1%

Antifreeze proteins (AFPs) found in various cold-adapted organisms inhibit ice growth and are of interest for applications in food products, cryopreservation, agriculture, and materials science. Although high-resolution structures are available for several AFPs, the amino acids required for full antifreeze activity remain incompletely defined, and the development of AFP variants with properties such as enhanced solubility, high expression yield, and improved thermostability may further facilitate applications. Here, we used the deep learning model ProteinMPNN to redesign the globular fish antifreeze protein AFPIII, keeping the previously reported ice-binding residues fixed. We readily obtained sequences confidently predicted to adopt AFPIII's structure and we selected five designed variants for expression, all of which expressed efficiently in E. coli. Circular dichroism spectroscopy showed that two of these variants retained secondary structure elements consistent with AFPIII, whereas the other three exhibited structural differences. One design was predicted and experimentally confirmed to have increased thermostability. All five variants displayed measurable thermal hysteresis activity. However, none reached the activity of wild-type AFPIII, suggesting that maintaining the currently established set of ice-binding residues is not sufficient to fully preserve this AFP's function; other, unidentified residues can also impact its activity. Our findings highlight the value of deep learning-based protein design methods both for generating AFP variants with desirable properties and for uncovering gaps in existing knowledge of well-characterized AFPs.

16

A generalisable framework to inject distance information into AlphaFold-like structure predictors

Mirabello, C.; Wallner, B.; Orekhov, V.; Nystedt, B.; Pearce, N.

2026-07-06 bioinformatics 10.64898/2026.07.02.736010 medRxiv

Top 0.6%

2.1%

Structure prediction methods are now highly successful at predicting three-dimensional structures from sequence. However, it is still often desirable to supplement these methods with additional external priors on pairwise distances in the structures. We present a general method for injecting prior information into AlphaFold-like structure predictors by biasing the pair representation to produce desirable features in the distogram, which are then reflected in the structures. We demonstrate this approach to: sample alternate states by selectively pushing or pulling mobile amino acid pairs; integrate NMR NOESY data with structure prediction; and improve the success of protein-protein and protein-ligand complex prediction. We demonstrate that this approach is applicable both to AlphaFold2 and a reproduction of AlphaFold 3 (OpenFold3). resTrain is open source, available to all users on GitHub and as a Colab notebook: https://github.com/clami66/resTrain

17

Evidence for lanthanide and PQQ dependent dehydrogenases in Eukarya

Robinson, C. M.; Martinez-Gomez, N. C.; West-Roberts, J. A.; Voutsinos, M. Y.; Banfield, J.

2026-07-14 bioinformatics 10.64898/2026.07.14.738520 medRxiv

Top 0.6%

2.0%

Lanthanides function as enzyme cofactors in bacteria, where they are widely distributed in pyrroloquinoline quinone-dependent 8-bladed beta-propeller dehydrogenases. No lanthanide-dependent enzymes, however, have been described outside prokaryotes. Here, we combined structural bioinformatics, phylogenetics, AlphaFold3 co-folding, coordination-sphere comparison, and quantum-mechanical cluster modeling to search for and rank putative lanthanide-coordinating 8-bladed beta-propeller enzymes in Eukarya. We identified candidate lanthanide-coordinating proteins in a diverse range of eukaryotes, predominantly plants and fungi, including species of clear industrial and agricultural relevance. A high-confidence subset matched validated bacterial Ln-binders based on both geometric similarity to canonical Ln-binding sites and on predicted Ln3+ versus Ca2+ selectivity. Our findings indicate that lanthanide biology likely extends beyond bacteria, with implications for plant, fungal, and broader eukaryotic metabolism, and warrant targeted biochemical investigation.

18

Structural Bioinformatics of Four Human Aquaporins and Their Water-Soluble QTY Analogs

Zhang, S.; Xiao, E.

2026-06-30 bioinformatics 10.64898/2026.06.24.734367 medRxiv

Top 0.7%

1.9%

Human aquaporins (AQPs) are essential membrane channels, yet their inherent hydrophobicity complicates structural and functional studies. We present the systematic application of the QTY code to human AQPs, integrating it with AlphaFold 3 structure prediction to show that four representative human AQPs (AQP1, AQP3, AQP4, AQP7) can be converted into water-soluble analogs while maintaining their conformation. This approach offers a novel platform for editing challenging membrane proteins. The QTY code was applied to the transmembrane regions of the selected four AQPs. Subsequently, the water-soluble QTY analogs of the four AQPs were predicted using AlphaFold 3. The predicted structures were superposed with cryo-EM- or X-ray-determined native structures in PyMOL. Further analyses included root-mean-square deviation (RMSD) calculations, visualization of hydrophobic surface reduction, and inspection of conserved protein-ligand binding ability. After applying the QTY code, sequence changes between native AQPs and their QTY analogs were significant (42.86-48.80%). Nevertheless, their structures superposed well in analyses, with only slight deviations (RMSD < 0.6 Å). In addition, the surface hydrophobicity of all QTY-edited AQPs was significantly reduced. Importantly, molecular contacts between the cholesterol ligand and protein were largely preserved for both native AQP1 and its QTY analog. Finally, all AlphaFold3-predicted structures for AQPs have high confidence values (pLDDT > 90; pTM ~0.83), supporting the reliability of the predicted structures. The findings demonstrate that membrane protein hydrophobicity can be edited and reduced without compromising fold integrity or functional architecture. Integration of the QTY code with AlphaFold 3 affords a high-throughput platform for designing water-soluble, structurally faithful analogs of challenging membrane proteins.
Such a strategy can provide a potent platform for detergent-free biochemical studies and supply water-soluble analogs for therapeutic monoclonal antibody discovery, thus advancing research on this pharmacologically important protein family.

19

BATTLE-AMP: Benchmarking Antimicrobial Peptide Predictors

Szymczak, P.; Bukała, A.; Zarzecki, W.; Sala, M.; Borisek, J.; Fadavi, S.; Olayo-Alarcon, R.; Sroka, J.; Colome-Tatche, M.; Gambin, A.; Müller, C. L.; Setny, P.; Szczurek, E.

2026-06-24 bioinformatics 10.64898/2026.06.19.733349 medRxiv

Top 0.7%

1.8%

As antimicrobial resistance outpaces antibiotic development, antimicrobial peptides (AMPs) have emerged as a promising class of alternative antibacterials, and computational predictors are increasingly used to prioritize AMP candidates. Such predictors are typically evaluated on binary AMP/non-AMP classification, which does not test whether they can identify peptides with clinically relevant potency against specific pathogens. We present BATTLE-AMP, a benchmarking framework that evaluates AMP predictors against experimentally measured minimum inhibitory concentrations (MICs) across clinically relevant bacterial species and strains. We surveyed 48 published methods, finding fewer than 25% reproducible, and benchmarked 10 model families (21 variants) using experimental MIC data, synthetic sequence perturbations, activity cliff analyses, and all-atom molecular dynamics (MD) simulations. Four findings emerge: (i) models trained on MIC data outperform binary classifiers regardless of architecture; (ii) the best model depends on the target pathogen, so model selection must be guided by the biological question; (iii) most models cannot distinguish active peptides from inactive sequences with identical amino acid composition; and (iv) activity cliffs remain unresolved by both machine learning and MD, marking a limit of current computational methods. BATTLE-AMP is released as an open Snakemake framework at https://github.com/szczurek-lab/battleamp-snakemake for benchmarking new models and scoring novel candidate libraries.

20

Prosculpt: Lowering the Barrier to Computational Protein Design

Olivieri, F.; Konstantinova, A.; Ribnikar, N.; Bizjak, N.; Žnidar, ?.; Abel, K.; Rajh, E.; Ljubetič, A.

2026-06-26 synthetic biology 10.64898/2026.06.25.732351 medRxiv

Top 0.7%

1.7%

Over the past decade, protein design has evolved from a specialized discipline into a broadly accessible approach for engineering and interrogating biological systems. Despite these advances, protein design remains technically challenging and often requires programming knowledge to use and combine the different software packages. To address this challenge, we have developed Prosculpt, an easy-to-use protein design pipeline. Prosculpt integrates RFdiffusion for backbone generation, ProteinMPNN for sequence design, and multiple structure-prediction platforms (AF2, AF3, ColabFold, Boltz-2). Candidate designs are evaluated using customizable Rosetta-based scoring protocols. Each project is specified through a single configuration file, enabling users with minimal computational expertise to perform sophisticated protein design tasks without writing code, while also allowing advanced users to access the full capabilities of the underlying programs. Prosculpt supports a wide range of applications, including design of symmetric homo-oligomers, design of binders, motif scaffolding, partial diffusion, and fixed-backbone sequence redesign. By combining these capabilities within a single, user-friendly platform, Prosculpt provides a practical entry point to modern protein design for both novice and expert users.